Inner Loop Optimizations in Mapping Single Threaded Programs to Hardware
نویسنده
چکیده
In the context of mapping high-level algorithms to hardware, we consider the basic problem of generating an efficient hardware implementation of a single threaded program, in particular, that of an inner loop. We describe a control-flow mechanism which provides dynamic loop-pipelining capability in hardware, so that multiple iterations of an arbitrary inner loop can be made simultaneously active in the generated hardware, We study the impact of this loop-pipelining scheme in conjunction with source-level loop-unrolling. In particular, we apply this technique to some common loop kernels: regular kernels such as the fast-fourier transform and matrix multiplication, as well as an example of an inner loop whose body has branching. The resulting resulting hardware descriptions are synthesized to an FPGA target, and then characterized for performance and resource utilization. We observe that the use of dynamic loop-pipelining mechanism alone typically results in a significant improvements in the performance of the hardware. If the loop is statically unrolled and if loop-pipelining is applied to the unrolled program, then the performance improvement is still substantial. When dynamic loop pipelining is used in conjunction with static loop unrolling, the improvement in performance ranges from 6X to 20X (in terms of number of clock cycles needed for the computation) across the loop kernels that we have studied. These optimizations do have a hardware overhead, but, in spite of this, we observe that the joint use of these loop optimizations not only improves performance, but also the performance/cost ratio of the resulting hardware.
منابع مشابه
On the Value and Limits of Multi-level Energy Consumption Static Analysis for Deeply Embedded Single and Multi-threaded Programs
There is growing interest in lowering the energy consumption of computation. Energy transparency is a concept that makes a program’s energy consumption visible from software to hardware through the different system layers. Such transparency can enable energy optimizations at each layer and between layers, and help both programmers and operating systems make energy aware decisions. The common me...
متن کاملPartial Redundancy Elimination for Multi-threaded Programs
Multi-threaded programs have many applications which are widely used such as operating systems. Analyzing multi-threaded programs differs from sequential ones; the main feature is that many threads execute at the same time. The effect of all other running threads must be taken in account. Partial redundancy elimination is among the most powerful compiler optimizations: it performs loop-invarian...
متن کاملSpecifying and executing optimizations for generalized control flow graphs
Compiler optimizations, usually expressed as rewrites on program graphs, are a core part of modern compilers. However, even production compilers have bugs, and these bugs are difficult to detect and resolve. In this paper we present Morpheus, a domain-specific language for formal specification of program transformations, and describe its executable semantics. The fundamental approach of Morpheu...
متن کاملA Proof Theory for Loop-Parallelizing Transformations
The microprocessor industry has embraced multicore architectures as the new dominant design paradigm. Harnessing the full power of such computers requires writing multithreaded programs, but regardless of whether one is writing a program from scratch or porting an existing single-threaded program, concurrency is hard to implement correctly and often reduces the legibility and maintainability of...
متن کاملLecture Notes on Loop Optimizations
Optimizing loops is particularly important in compilation, since loops (and in particular the inner loops) account for much of the executions times of many programs. Since tail-recursive functions are usually also turned into loops, the importance of loop optimizations is further magnified. In this lecture we will discuss two main ones: hoisting loop-invariant computation out of a loop, and opt...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1411.0863 شماره
صفحات -
تاریخ انتشار 2014